Labelling strategies for hierarchical multi-label classification techniques
Abstract
Many hierarchical multi-label classification systems predict a real-valued score for every (instance, class) pair, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave the conversion of these scores to an actual label set to the user, who applies a cut-off value to the scores. The predictive performance of these classifiers is usually evaluated using threshold-independent measures such as precision-recall curves. However, several applications require actual label sets, and thus an automatic labelling strategy. In this paper, we present and evaluate different alternatives for performing the actual labelling in hierarchical multi-label classification. We investigate the selection of both single and multiple thresholds. Although multiple threshold selection strategies exist for non-hierarchical multi-label classification, they cannot be applied directly in the hierarchical context. The proposed strategies are implemented within two main approaches: optimising a performance measure of interest (such as F-measure or hierarchical loss), and reproducing training set properties (such as class distribution or label cardinality) in the predictions. We assess the performance of the proposed labelling schemes on 10 datasets from different application domains. Our results show that selecting multiple thresholds may result in an efficient and effective solution for hierarchical multi-label problems. © 2016 Elsevier Ltd. All rights reserved.
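As a rough illustration of the two approaches named in the abstract, the sketch below picks a single global threshold either by maximising micro-averaged F-measure or by matching the label cardinality of a reference label set. It is not the paper's procedure. The function names, the tree-shaped hierarchy given as a `parent` map, the assumption that every parent has a smaller class index than its children, and the rule that a class is only predicted when its parent is predicted are all assumptions made for this example.

```python
from typing import Dict, List, Optional

Scores = List[List[float]]  # scores[i][c]: score of instance i for class c
Labels = List[List[int]]    # labels[i][c]: 1 if instance i has class c, else 0
Parents = Dict[int, Optional[int]]  # parent[c]: parent class of c, None for a root


def apply_threshold(scores: Scores, t: float, parent: Parents) -> Labels:
    """Predict class c when its score >= t and its parent (if any) is predicted.

    Assumes parent[c] < c, so parents are decided before their children.
    """
    predictions = []
    for row in scores:
        pred = [0] * len(row)
        for c, score in enumerate(row):
            p = parent[c]
            if score >= t and (p is None or pred[p] == 1):
                pred[c] = 1
        predictions.append(pred)
    return predictions


def micro_f1(true: Labels, pred: Labels) -> float:
    """Micro-averaged F-measure over all (instance, class) pairs."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(true, pred):
        for y, z in zip(true_row, pred_row):
            if y == 1 and z == 1:
                tp += 1
            elif y == 0 and z == 1:
                fp += 1
            elif y == 1 and z == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_cardinality(labels: Labels) -> float:
    """Average number of positive labels per instance."""
    return sum(sum(row) for row in labels) / len(labels)


def candidate_thresholds(scores: Scores) -> List[float]:
    """Every distinct score is a candidate cut-off."""
    return sorted({s for row in scores for s in row})


def threshold_by_f1(scores: Scores, true: Labels, parent: Parents) -> float:
    """Single threshold that maximises micro F-measure on the given data."""
    return max(
        candidate_thresholds(scores),
        key=lambda t: micro_f1(true, apply_threshold(scores, t, parent)),
    )


def threshold_by_cardinality(scores: Scores, target: float, parent: Parents) -> float:
    """Single threshold whose predictions best match a target label cardinality."""
    return min(
        candidate_thresholds(scores),
        key=lambda t: abs(
            label_cardinality(apply_threshold(scores, t, parent)) - target
        ),
    )
```

In practice the threshold would presumably be chosen on training or validation data and then applied to unseen instances. The same search could be repeated per class or per hierarchy level to obtain multiple thresholds, although the sketch does not attempt that.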
Similar resources
Exploiting Associations between Class Labels in Multi-label Classification
Multi-label classification has many applications in text categorization, biology, and medical diagnosis, where multiple class labels can be assigned to each training instance simultaneously. As it is often the case that there are relationships between the labels, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...
Hierarchical Core Vector Machines for Network Intrusion Detection
Because network intrusions exhibit a hierarchical multi-label structure, we develop a hierarchical core vector machines (HCVM) algorithm for high-speed network intrusion detection via hierarchical multi-label classification of network data. HCVM models a multi-label hierarchy as a data hyper-sphere constructed from a number of core vector machines (CVMs). As the CVMs in an HCVM are sepa...
Multi-label large margin hierarchical perceptron
This paper looks into classification of documents that have hierarchical labels and are not restricted to a single label. Previous work in hierarchical classification focuses on the hierarchical perceptron (Hieron) algorithm. Hieron only supports single label learning. We investigate applying several standard multi-label learning techniques to Hieron. We then propose an extension of the algorit...
MLIFT: Enhancing Multi-label Classifier with Ensemble Feature Selection
Multi-label classification has gained significant attention in recent years, due to the increasing number of modern applications associated with multi-label data. Despite its short life, different approaches have been presented to solve the task of multi-label classification. LIFT is a multi-label classifier that takes a new approach to multi-label learning by leveraging label-specific ...
Immuno-gold Labelling of Chlamydia trachomatis
Background: Chlamydia trachomatis is considered an important cause of preventable sexually transmitted diseases worldwide. It is known to be an obligate intracellular organism and enters its target cells via an endocytic process. As the major outer membrane protein (MOMP) is one of the main candidates for the attachment and entry of chlamydia to the host cells, we have tried to label the epitopes...
Journal title:
- Pattern Recognition
Volume 56
Publication date: 2016